Learning Phrase-Based Spelling Error Models from Clickthrough Data

Authors

  • Xu Sun
  • Jianfeng Gao
  • Daniel Micol
  • Chris Quirk
Abstract

This paper explores the use of clickthrough data for query spelling correction. First, a large number of query-correction pairs are derived by analyzing users' query reformulation behavior as recorded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
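The idea of a phrase-based error model trained from query-correction pairs can be illustrated with a minimal sketch. The abstract does not say how the transformation probabilities are estimated, so the code below assumes plain relative-frequency estimation and a simplified one-to-one token alignment (pairs with differing token counts are skipped). The function names `extract_phrase_pairs` and `train_error_model`, the `max_len` parameter, and the toy pairs are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def extract_phrase_pairs(query, correction, max_len=2):
    """Return (correct_phrase, observed_phrase) pairs of up to max_len tokens.

    Assumption: query and correction align token by token, so pairs with
    different token counts are skipped in this sketch.
    """
    q_tokens = query.split()
    c_tokens = correction.split()
    if len(q_tokens) != len(c_tokens):
        return []
    pairs = []
    for n in range(1, max_len + 1):
        for i in range(len(q_tokens) - n + 1):
            correct = " ".join(c_tokens[i:i + n])
            observed = " ".join(q_tokens[i:i + n])
            pairs.append((correct, observed))
    return pairs


def train_error_model(query_correction_pairs, max_len=2):
    """Estimate P(observed_phrase | correct_phrase) by relative frequency."""
    counts = defaultdict(Counter)
    for query, correction in query_correction_pairs:
        for correct, observed in extract_phrase_pairs(query, correction, max_len):
            counts[correct][observed] += 1
    model = {}
    for correct, observed_counts in counts.items():
        total = sum(observed_counts.values())
        model[correct] = {obs: n / total for obs, n in observed_counts.items()}
    return model


# Toy (query, correction) pairs invented for illustration only.
toy_pairs = [
    ("britny spears", "britney spears"),
    ("britney spears", "britney spears"),
    ("brittany spears", "britney spears"),
    ("britny spears", "britney spears"),
]
model = train_error_model(toy_pairs)
print(model["britney"])
print(model["britney spears"]["britny spears"])
```

Because phrases of more than one token are counted alongside single tokens, the table can hold multi-term transformations in addition to word-level ones. A real system would need a proper alignment step to handle splits and merges such as "face book" versus "facebook", which this sketch ignores.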


Similar resources

Learning to Rank with Attentive Media Attributes

In the context of media search engines where assets have small textual data available, we explore several models that improve the learning to rank use cases. In particular, we propose a model with an attention mechanism that leverages phrase-based attributes to guide the importance of other keyword-based attributes. We train these models with clickthrough data from Adobe Stock search queries an...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape joining characters) that complicate its handling on electronic devices. Furthermore, the design and implementation of NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...

WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data

In this paper we present WCL2R, a benchmark collection for supporting research in learning to rank (L2R) algorithms which exploit clickthrough features. Differently from other L2R benchmark collections, such as LETOR and the recently released Yahoo!’s collection for a L2R competition, in WCL2R we focus on defining a significant (and new) set of features over clickthrough data extracted from the...

Robust Error Detection: A Hybrid Approach Combining Unsupervised Error Detection and Linguistic Knowledge

This article presents a robust probabilistic method for the detection of context-sensitive spelling errors. The algorithm identifies less-frequent grammatical constructions and attempts to transform them into more-frequent constructions while retaining similar syntactic structure. If the transformations result in low-frequency constructions, the text is likely to contain an error. A first unsuper...

Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique

This paper proposes a novel multi-layer approach to fundamental frequency modeling for concatenative speech synthesis based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of long-term, intonational phrase-level, component and short-term, accentual phrase-level, component, along with a least-squares error criterion that includes a re...

Publication date: 2010